Predicting EA Sports FIFA Team of the Season in Europe’s Five Leagues
The video game FIFA, developed by Electronic Arts (EA) Sports, has become the most popular sports video game in the world in recent years, largely due to its game mode called Ultimate Team. The objective of Ultimate Team is to build the best possible team by buying and selling players and by purchasing packs of cards, which replicates the process of buying soccer trading cards in real life. Each player receives ratings in various categories based on their real-life abilities, and each of these ratings factors into their overall rating. At the end of each season, EA Sports creates a Team of the Season (TOTS), selecting the best player at each position in each league based on real-life performance that season. Players who receive TOTS cards also receive a boost to their overall rating to reflect that performance. Although most TOTS selections appear predictable to fans, there are occasionally perplexing selections that shock them, and EA has never explained how it makes its choices. Using machine learning methods and predictive modeling, we aim to determine which variables are most important when choosing a player for TOTS, and to predict the Team of the Season for Europe’s top five leagues based on this season’s statistics.
Materials:
We retrieved complete player datasets for FIFA 17, FIFA 18, and FIFA 19 from Kaggle. We retrieved real-life statistics for the 2016-2017, 2017-2018, and 2018-2019 seasons from fbref.com. We did not use data from the 2019-2020 season because COVID-19 disrupted each league’s season in March of 2020.
Methods:
Using these datasets, we predicted TOTS players with a random forest model. We tested other models, but the random forest performed best. A random forest builds many decision trees from the training data, each of which predicts whether a player will be in the TOTS based on the statistics we feed into it, and then combines the trees’ votes into a single decision for each player. We then applied the model to data it had not seen during training, which allowed us to measure how well the model actually performed.
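The idea of growing many trees on resampled data and combining their votes can be sketched in a few lines. The example below is a simplified stand-in and not the model used in this project: it bags one-split decision "stumps" instead of full trees, and the toy data, the two features (goals and 90s played), and all function names are hypothetical.

```python
import random

def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows.

    Assumes higher values make TOTS (label 1) more likely, which suits
    features such as goals or minutes but not table position.
    """
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        errors = sum((r[feature] >= threshold) != bool(y)
                     for r, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return feature, best[1]

def fit_forest(rows, labels, n_trees=100, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw as many rows as the data has, with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Giving each stump one random feature keeps the stumps from
        # all making the same split.
        feature = rng.randrange(n_features)
        forest.append(fit_stump(sample, sample_labels, feature))
    return forest

def predict_proba(forest, row):
    """Share of stumps that vote TOTS for this player."""
    votes = sum(row[feature] >= threshold for feature, threshold in forest)
    return votes / len(forest)

# Hypothetical players: (goals, 90s played); label 1 = TOTS, 0 = Normal.
rows = [(2, 10), (0, 25), (3, 30), (1, 5), (18, 34), (22, 36), (15, 30), (4, 20)]
labels = [0, 0, 0, 0, 1, 1, 1, 0]

forest = fit_forest(rows, labels)
star = predict_proba(forest, (25, 37))     # prolific scorer who played a lot
reserve = predict_proba(forest, (1, 8))    # rarely used squad player
print(star, reserve)
```

The vote share returned by `predict_proba` plays the role of a predicted probability, so players can be ranked by it as well as classified with a cutoff such as 0.5.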
Revision: Whether the card is “Normal” or “Team of the Season (TOTS)”
Int: Interceptions
TklW: Tackles Won
OG: Own Goals
Pkcon: Penalties Conceded
MP: Matches Played
Min: Minutes
Gls: Goals
Ast: Assists
Non_Pk_G: Non Penalty Goals (Goals from Open Play or Free Kicks)
Pk: Penalty Kicks
Pkatt: Penalty Attempts
CrdY: Yellow Cards
CrdR: Red Cards
G_per90: Goals per 90 minutes
A_per90: Assists per 90 minutes
G_plus_A_per90: Goals plus Assists per 90 minutes
G_minus_pk_per90: Non Penalty Goals per 90 minutes
Rk: Table Position
GF: Goals For (Goals your team has scored)
GA: Goals Against (Goals your team has conceded)
GD: Goal Difference (GF-GA)
Pts: Team Points for the Season (3 for a win, 1 for a draw, 0 for a loss)
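The derived variables in this list follow directly from the counting stats. As a minimal sketch, assuming a hypothetical player and team season (the numbers and function names below are illustrative and do not come from the datasets):

```python
def per90(count, minutes):
    """Scale a season total to a rate per 90 minutes played."""
    if minutes <= 0:
        return 0.0  # avoid dividing by zero for players who never appeared
    return count * 90 / minutes

def goal_difference(gf, ga):
    """GD: goals for minus goals against."""
    return gf - ga

def points(wins, draws):
    """Pts: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3 * wins + draws

# Hypothetical player season: 12 goals (2 of them penalties), 6 assists, 2700 minutes.
gls, ast, pk, minutes = 12, 6, 2, 2700
g_per90 = per90(gls, minutes)               # Gls per 90
a_per90 = per90(ast, minutes)               # Ast per 90
g_plus_a_per90 = per90(gls + ast, minutes)  # (Gls + Ast) per 90
g_minus_pk_per90 = per90(gls - pk, minutes) # non-penalty goals per 90

# Hypothetical team season: 70 scored, 35 conceded, 24 wins, 8 draws.
gd = goal_difference(70, 35)
pts = points(24, 8)
print(g_per90, a_per90, g_plus_a_per90, round(g_minus_pk_per90, 3), gd, pts)
```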
In this project, we worked with data from Europe’s five major soccer leagues (the Premier League, La Liga, Ligue 1, the Bundesliga, and Serie A) to predict the TOTS for each league. Before evaluating each league individually, we examined league-wide averages to get an idea of their play styles. The table below shows that, overall, the Premier League has the highest-scoring environment, with the highest average number of both goals and assists. This could indicate that a player at an attacking position needs a higher offensive output to secure TOTS honors in the Premier League, since competition on these counting stats is fiercer. La Liga and Serie A also show relatively high-scoring environments, while the Bundesliga and Ligue 1 appear more defensive based on their lower goal averages.
| League | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Premier League | 2.09 | 1.47 | 1.94 | 0.15 | 10.58 | 17.05 | 3.75 | 2.27 | 3.42 | 5.74 | 11.47 |
| La Liga | 2.05 | 1.41 | 1.85 | 0.20 | 10.60 | 16.80 | 3.89 | 2.04 | 3.46 | 5.82 | 10.62 |
| Ligue 1 | 1.95 | 1.30 | 1.75 | 0.21 | 10.49 | 16.94 | 3.67 | 1.95 | 3.17 | 5.77 | 11.09 |
| Bundesliga | 1.97 | 1.39 | 1.82 | 0.16 | 9.64 | 15.07 | 3.42 | 2.09 | 3.07 | 5.13 | 10.02 |
| Serie A | 2.05 | 1.35 | 1.85 | 0.19 | 10.61 | 16.51 | 3.73 | 2.06 | 3.32 | 5.77 | 11.07 |
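Summaries like the table above come from grouping player rows by league and taking the mean and standard deviation of each stat. The sketch below shows one way to do that with the standard library; the player rows are made up, the column names are borrowed from the variable list, and the sample standard deviation is an assumption about how the SD columns were computed.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical player rows; the real data has one row per player per season.
players = [
    {"league": "Premier League", "Gls": 0, "Ast": 1},
    {"league": "Premier League", "Gls": 2, "Ast": 0},
    {"league": "Premier League", "Gls": 10, "Ast": 5},
    {"league": "La Liga", "Gls": 1, "Ast": 2},
    {"league": "La Liga", "Gls": 3, "Ast": 4},
]

def summarize(rows, group_key, stat_keys):
    """Return {group: {stat: (mean, sample SD)}} for the requested stats."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    summary = {}
    for name, members in groups.items():
        summary[name] = {}
        for key in stat_keys:
            values = [m[key] for m in members]
            # stdev needs at least two rows per group.
            summary[name][key] = (mean(values), stdev(values))
    return summary

league_summary = summarize(players, "league", ["Gls", "Ast"])
for league, stats in league_summary.items():
    print(league, {k: (round(m, 2), round(s, 2)) for k, (m, s) in stats.items()})
```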
The Premier League is widely considered the best league in the world, one full of tradition and history that has seen many dominant teams and outstanding players. In recent history, the league has been generally dominated by Manchester City and Liverpool, both of which won league titles by large margins. With the influx of foreign money into the league, the talent gap between the top and the bottom of the league has seen steady growth, but those at the bottom continue to make it competitive.
Before diving into modeling, we explored the data to observe basic trends. First, we looked at the proportion of Premier League cards that are given the TOTS designation. Below, we see that a select few cards are given the TOTS designation.
We also wanted to look at goals scored by TOTS players versus normal players. In this density plot, we are able to see that TOTS players score significantly more goals than regular players.
We also found that final table position and player card status were highly correlated, specifically that players with TOTS cards generally played for teams that finished higher in the table. In the past three years, each team of the season has generally been filled with many of the top teams’ players, and the density plot below reflects this.
Players who receive TOTS cards are usually the most important players to their teams, and because of this, play more minutes per contest. The density plot below is evidence of this fact.
Finally, the distribution of TOTS cards is expected to vary from league to league, so it is important to look at the distribution specific to the Premier League. In the Premier League, the position with the highest number of TOTS cards is striker.
Before modeling the data, we split it into training and testing sets. The training data is what we gave the model to learn from, while the testing data is what we used to evaluate it. It is important that the Key Performance Indicators (KPIs) are similar in the two sets, because this indicates that both sets represent the same population of players, so performance on the testing data is a fair measure of what the model learned from the training data.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 3.055851 | 2.327128 | 2.781915 | 0.2739362 | 11.069149 | 26.94010 | 3.649960 | 2.360092 | 3.264401 | 5.455811 | 5.377800 |
| Normal | Testing | 3.200000 | 2.128000 | 2.872000 | 0.3280000 | 11.816000 | 27.45058 | 3.632292 | 2.094447 | 3.113384 | 5.492493 | 5.310808 |
| TOTS | Training | 8.942308 | 5.250000 | 8.269231 | 0.6730769 | 3.557692 | 31.76645 | 8.756876 | 4.167451 | 7.819321 | 3.268468 | 3.829581 |
| TOTS | Testing | 10.470588 | 7.352941 | 10.117647 | 0.3529412 | 4.058823 | 29.53987 | 8.768678 | 4.372373 | 8.388104 | 3.230143 | 4.999096 |
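One way to keep the training and testing sets comparable is to split each card type separately, so that both sets contain the same share of TOTS cards. The sketch below assumes a stratified split with a 25% testing fraction; the source does not state how the split was made, so the fraction, the seed, and the function name are all assumptions.

```python
import random

def stratified_split(rows, label_key, test_fraction=0.25, seed=42):
    """Split rows into (train, test), preserving the label proportions."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    train, test = [], []
    for members in by_label.values():
        members = members[:]          # copy so the caller's list is untouched
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

def tots_share(rows):
    """Fraction of rows whose card type is TOTS."""
    return sum(r["Revision"] == "TOTS" for r in rows) / len(rows)

# Hypothetical data: 8 TOTS cards and 40 Normal cards.
players = [{"Revision": "TOTS" if i < 8 else "Normal", "Gls": i} for i in range(48)]
train, test = stratified_split(players, "Revision")
print(len(train), len(test), tots_share(train), tots_share(test))
```

After a split like this, comparing the mean and standard deviation of each KPI by card type and set, as in the table above, is a quick check that neither set is unusual.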
After seeing that our training and testing sets looked similar, we created a random forest model to predict whether a player would be classified as TOTS or not. Our random forest model was made up of 100 decision trees. Each tree is built from a different random sample of the data, which keeps the trees from being strongly correlated with one another and helps provide stability and accuracy to the model. We also fit a LASSO model, which filters out explanatory variables by shrinking the coefficients of the less useful ones to zero, on the same training and testing data. However, we found that the random forest was more accurate.
Using our random forest model, we were able to observe which variables were most important to the model. It appears that goals conceded by the player’s team (GA), minutes played, and matches played were the most important.
The confusion matrix below shows that 17 players in the testing data were given TOTS cards. The model correctly classified 7 of them, while it predicted that the other 10 should not have been given one. It also predicted that 5 players who were not given TOTS cards should have been given one.
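The counts in this paragraph can be turned into standard summary metrics. The sketch below reads them as 7 true positives, 5 false positives, and 10 false negatives, which is an interpretation of the text rather than output from the original analysis; the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Premier League testing data, as read from the paragraph above.
precision, recall, f1 = classification_metrics(tp=7, fp=5, fn=10)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Precision is the share of predicted TOTS players who actually received a card (7 of 12), and recall is the share of actual TOTS players the model found (7 of 17).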
Finally, we applied our model to the Premier League stats from the 2020-2021 season. The players who were chosen for TOTS are shown below.
La Liga has been dominated for many years by Barcelona and Real Madrid, two of the most storied clubs in the world. For the past decade it has been the story of Leo Messi vs. Cristiano Ronaldo, best vs. best. These two clubs have won the most Champions League trophies in the last decade and it is rare that one of them does not win the league. Outside of those two clubs the league somewhat struggles for talent, especially defensively, but the gap has seen some closing in the last few years.
The first exploratory plot we looked at was the number of TOTS vs Normal cards. Once again, we see that there are not many TOTS players in the data set.
Then, we looked at goals scored by TOTS and normal players. While both groups are concentrated at low goal totals, TOTS players tend to score more goals.
Next, we created a density plot of table position for TOTS players and normal players. TOTS players tended to play for teams that finished higher in the table.
We then created a density plot of minutes played for the TOTS players vs the normal players. Much like in the Premier League, TOTS players played more minutes.
Finally, the last exploratory plot shows the distribution of the positions. There are not many center forwards, so they have been converted to strikers.
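Merging a sparse position into a neighboring one is a simple recoding step. The sketch below assumes the positions are stored as the usual FIFA abbreviations ("CF" for center forward, "ST" for striker); the labels, the mapping, and the sample list are hypothetical.

```python
from collections import Counter

# Assumed mapping: fold the rare center-forward label into striker.
POSITION_MAP = {"CF": "ST"}

def recode_position(position, mapping=POSITION_MAP):
    """Return the merged position, leaving unmapped positions unchanged."""
    return mapping.get(position, position)

positions = ["ST", "CF", "CB", "CM", "CF", "GK", "ST"]
recoded = [recode_position(p) for p in positions]
counts = Counter(recoded)
print(counts)
```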
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.817416 | 2.264045 | 2.539326 | 0.2780899 | 10.808989 | 26.28998 | 3.531401 | 2.079531 | 3.137438 | 5.438779 | 4.568289 |
| Normal | Testing | 3.355932 | 2.177966 | 3.050848 | 0.3050847 | 10.576271 | 26.12580 | 3.938687 | 2.205786 | 3.458787 | 5.441855 | 4.913905 |
| TOTS | Training | 10.333333 | 5.062500 | 8.833333 | 1.5000000 | 5.041667 | 29.98079 | 9.800854 | 3.628529 | 8.706401 | 4.980636 | 3.785520 |
| TOTS | Testing | 7.533333 | 4.533333 | 6.533333 | 1.0000000 | 3.266667 | 28.81852 | 10.232069 | 3.907258 | 9.287985 | 3.473711 | 3.411219 |
The plot below shows the importance of the variables in the La Liga model. The most important stats are “Minutes Played”, “Assists”, and “Team Goal Differential”.
The confusion matrix below shows us that we predicted 9 TOTS correctly, 114 normal cards correctly, and 10 total incorrectly in the testing data for La Liga.
Below is the predicted team of the season for La Liga during the 2020-2021 season.
Generally considered the worst of the Top 5 European leagues, Ligue 1 has been dominated by PSG for many years. Often called a “farmer’s league”, Ligue 1 is sometimes not even considered among the best leagues in the world. However, there is no doubt that PSG is one of the best teams in the world. With the likes of Kylian Mbappe and Neymar, they managed to make it to the Champions League Final last season, and most recently lost in the semi-final this year.
We began with exploratory plots once again. The first plot showed us how many players were given TOTS cards. Once again, only a small proportion of players were given TOTS cards.
Next, we looked at the density of goals scored between regular players and TOTS players. We were able to see that in general, a larger proportion of TOTS players scored a higher number of goals.
Next, we looked at the density of table position by card type. We see that there was an even density of table position for normal cards, while the majority of TOTS players played for better teams.
We then looked at the density of minutes played per match and, unsurprisingly, players who were given TOTS cards played more minutes per contest.
Finally, we looked at the distribution of TOTS cards by position. We were able to see that there is an overwhelming number of strikers and center backs in Ligue 1.
We also evaluated the metrics between the training and testing data to see if there was a significant difference between the two. For Ligue 1, there was not a significant difference in any of the important columns.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.874652 | 2.172702 | 2.532033 | 0.3426184 | 10.899721 | 26.99214 | 3.531724 | 2.076264 | 2.992253 | 5.400287 | 5.060310 |
| Normal | Testing | 2.613445 | 1.714286 | 2.411765 | 0.2016807 | 11.747899 | 26.66788 | 3.447203 | 1.790478 | 3.179414 | 5.642182 | 5.015616 |
| TOTS | Training | 8.958333 | 4.666667 | 7.583333 | 1.3750000 | 4.062500 | 28.75231 | 8.310358 | 3.610466 | 7.040410 | 4.107110 | 4.788361 |
| TOTS | Testing | 10.200000 | 4.533333 | 8.666667 | 1.5333333 | 2.666667 | 28.76444 | 10.516654 | 3.481926 | 8.829065 | 1.799471 | 4.595155 |
In this model, the most important variables were minutes played, goal differential, and goals plus assists per 90 minutes. These three variables contributed to the card classification significantly more than the other variables.
Overall, this model predicted that 19 players met our criteria to be selected for team of the season, while also misclassifying 16 players.
Finally, the Ligue 1 TOTS is shown below.
Often called the league of the people because of its 50+1 rule, which generally requires club members to retain a majority of voting rights, the German Bundesliga is considered the second-best defensive league behind the Premier League. Bayern Munich have dominated the league for many years, often poaching the best players from other teams in the league.
First, we looked at how many TOTS players there were vs normal players in our Bundesliga data set. Once again, TOTS was not given to many players.
Next, we looked at a density plot of goals scored. The peak for both TOTS and normal cards was at a fairly low goal total, but TOTS players tended to score more.
Next, we created a density plot of the table position of TOTS vs normal players. As with the other leagues, the TOTS players tended to play for better teams.
Subsequently, we created a density plot of minutes played for normal cards vs TOTS cards, and the TOTS players played many more minutes.
Lastly, we plotted a distribution of the positions and which positions got TOTS cards. There are not many wing players in the Bundesliga, so they were converted to left and right midfielders.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.907143 | 2.150000 | 2.682143 | 0.2250000 | 9.753571 | 25.36694 | 3.480404 | 2.017575 | 3.146774 | 4.968718 | 3.889732 |
| Normal | Testing | 2.387097 | 2.043011 | 2.172043 | 0.2150538 | 10.279570 | 24.69164 | 3.003738 | 2.004961 | 2.619424 | 4.583443 | 3.575467 |
| TOTS | Training | 9.583333 | 6.062500 | 8.583333 | 1.0000000 | 4.458333 | 27.66389 | 7.601325 | 3.732584 | 6.772389 | 3.439188 | 3.976717 |
| TOTS | Testing | 6.937500 | 4.750000 | 5.875000 | 1.0625000 | 4.937500 | 26.58889 | 7.196932 | 4.297286 | 5.806605 | 3.923752 | 4.262849 |
Below is the variable importance plot for the Bundesliga model. The most important variables are “Minutes Played”, “Goals Against (Team)” and “Non Penalty Goals plus Assists per 90 Minutes”.
The confusion matrix below details the accuracy of the predictions for the testing data. 7 TOTS were predicted correctly, 85 normal cards were predicted correctly, and 17 total cards were predicted incorrectly.
Finally, below is the predicted team of the season players in the Bundesliga:
Serie A has one of the richest histories in Europe, with the likes of AC Milan, Inter Milan, and Juventus all having great success. However, in recent history the league has been completely dominated by Juventus, who won 9 titles in a row before being stopped this year by Inter.
First, we made a bar chart to see the number of TOTS players in Serie A.
Next, we created a density plot of goals scored. TOTS players scored slightly more goals than normal players.
Then, we made a density plot of team rank for TOTS players vs normal players. TOTS players played for teams that finished much higher in the table.
Next, we made a distribution plot of how much the TOTS players played vs normal players. The TOTS players played much more than the normal players.
Finally, we made a plot of the positional breakdown of all the players. Players are heavily concentrated at center back, center midfield, and striker.
Next, we made a table to compare important stats for the training and testing data. There was no significant difference between the training and testing metrics.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 3.077994 | 2.222841 | 2.779944 | 0.2980501 | 11.000000 | 26.89774 | 3.696524 | 2.217360 | 3.290249 | 5.392422 | 4.849271 |
| Normal | Testing | 3.159664 | 2.521008 | 2.873950 | 0.2857143 | 9.974790 | 27.16788 | 3.347104 | 2.295348 | 3.147651 | 5.812701 | 4.795889 |
| TOTS | Training | 10.037736 | 5.377358 | 8.886793 | 1.1509434 | 4.603774 | 29.95765 | 8.257775 | 3.685869 | 7.135125 | 3.465987 | 4.914517 |
| TOTS | Testing | 10.352941 | 3.529412 | 9.294118 | 1.0588235 | 2.941177 | 28.31634 | 9.191988 | 2.741296 | 7.605629 | 2.045440 | 5.374223 |
Below is a plot of the most important variables in our Serie A model. “Minutes Played”, “Tackles Won”, and “Assists” were the most important variables.
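We also created a confusion matrix of the predictions and true values for the testing data. We predicted 9 TOTS players correctly and 10 incorrectly. While this is not great, we consider the result good enough because the predicted probabilities are ordered fairly well: players who received TOTS cards tend to rank above those who did not, even when they fall short of the classification cutoff.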
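How well predicted probabilities are ordered can be measured separately from the yes/no classification, for example with the area under the ROC curve (AUC). The source does not report this metric, so the sketch below is only an illustration on made-up labels and probabilities, with a hypothetical function name.

```python
def roc_auc(labels, scores):
    """Probability that a random TOTS player is scored above a random normal player.

    Ties count as half a win. Labels are 1 (TOTS) or 0 (Normal).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one player of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical testing data: 3 TOTS players and 5 normal players.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.62, 0.41, 0.18, 0.35, 0.22, 0.10, 0.05, 0.02]

# With a 0.5 cutoff only one of the three TOTS players is caught.
caught = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= 0.5)
auc = roc_auc(labels, scores)
print(caught, round(auc, 3))
```

In this toy case the cutoff finds one of three TOTS players, yet 13 of the 15 TOTS/normal pairs are ranked correctly, which is the kind of gap between classification and ordering described above.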
Finally, here is the predicted TOTS for Serie A this year.
Here we show how Manchester City midfielder Kevin De Bruyne would be modeled in all the different leagues had he played in them. This demonstrates the similarities and differences between the models, and the subtle differences between each league.
In all of the leagues he performs well. However, the Bundesliga model treats assists as a more important stat, while the La Liga and Ligue 1 models treat goal differential as more important. Overall, each league’s model favors different stats. Because of this, we were unable to create a single model for all players across all leagues.
In conclusion, we found that TOTS selection is very hard to predict. Our models did not classify TOTS versus normal cards exactly, but they did seem to order the predicted probabilities well. The stats that our models favored most were how well the player’s team performed during each season and how much the player played. The models also used other stats fairly effectively, but they struggled to predict players who played well on worse teams. Therefore, these models likely could not be used for much other than showing that much of what EA Sports does in picking who gets these cards is subjective. Building these models confirmed our suspicion that there is no exact method to EA’s madness. An interesting implication is how getting or not getting one of these cards affects the public’s perception of a player. Are there players who should be more highly rated by soccer fans, but are not because they did not get a TOTS card? This is one of the many possible consequences of EA’s choices for TOTS.